
@sdesmalen-arm
Collaborator

@sdesmalen-arm sdesmalen-arm commented Apr 4, 2025

This tries to reland #123632 (previously reverted by commit
6b1db79)

This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled and this is now the so-manieth
reincarnation of this effort :)

This change is needed in order to enable subreg liveness tracking for
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the
top 32 bits of the register are zeroed, which subsequent instructions
rely on.
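A toy C++ model may help picture the zero-extension that the implicit-def is meant to capture. It is not LLVM code: `XReg`, `writeWZeroExtend` and `writeWLowLanesOnly` are hypothetical names. The first function follows the AArch64 rule that a write to a W register zeroes the top 32 bits of the corresponding X register. The second mimics a liveness model that only sees a def of the 32-bit sub-register and therefore assumes the upper lanes keep their old contents.

```cpp
#include <cstdint>

// Toy 64-bit AArch64 X register; its low 32 bits form the W register.
struct XReg {
  std::uint64_t value;
};

// Hardware behaviour: writing Wn zero-extends into Xn, so the top 32 bits
// become 0. This is the effect that SUBREG_TO_REG 0, ..., sub_32 relies on.
void writeWZeroExtend(XReg &r, std::uint32_t w) {
  r.value = static_cast<std::uint64_t>(w);
}

// What a model that only records a def of the 32-bit sub-register assumes:
// the low 32 bits change and the upper 32 bits keep their previous value.
void writeWLowLanesOnly(XReg &r, std::uint32_t w) {
  r.value = (r.value & 0xFFFFFFFF00000000ULL) | w;
}
```

Starting both registers at `0xDEADBEEF00000000` and writing 42, the first function leaves `0x000000000000002A` and the second `0xDEADBEEF0000002A`. The differing upper half is the information that, per the description above, Machine Copy Propagation does not take into account when the coalesced instruction carries no def of the full register.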

Changes compared to previous PR:

  • Rather than updating all instructions that define the source register
    (SrcReg) of the SUBREG_TO_REG, this new approach only updates instructions
    that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges
    are updated accordingly.
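The description does not show how the dominance test is carried out inside the coalescer, so the following is only a self-contained sketch of the idea, not the patch's implementation. It computes dominator sets for a toy CFG with the classic iterative data-flow algorithm and keeps only the defs of SrcReg that dominate the SUBREG_TO_REG. `DomSets`, `computeDominators`, `InstrPos` and `dominatingDefs` are hypothetical names, and block 0 is assumed to be the entry block.

```cpp
#include <cstddef>
#include <vector>

// dom[b][d] == true means "block d dominates block b".
using DomSets = std::vector<std::vector<bool>>;

// Iterative data-flow: Dom(entry) = {entry};
// Dom(b) = {b} united with the intersection of Dom(p) over all predecessors p.
// Block 0 is the entry; unreachable blocks are not handled by this toy.
DomSets computeDominators(const std::vector<std::vector<std::size_t>> &succs) {
  const std::size_t n = succs.size();
  std::vector<std::vector<std::size_t>> preds(n);
  for (std::size_t b = 0; b < n; ++b)
    for (std::size_t s : succs[b])
      preds[s].push_back(b);

  // Start from "every block dominates every block", except for the entry.
  DomSets dom(n, std::vector<bool>(n, true));
  dom[0].assign(n, false);
  dom[0][0] = true;

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 1; b < n; ++b) {
      std::vector<bool> next(n, true);
      for (std::size_t p : preds[b])
        for (std::size_t d = 0; d < n; ++d)
          next[d] = next[d] && dom[p][d];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// Position of an instruction: its block and its index within that block.
struct InstrPos {
  std::size_t block;
  std::size_t index;
};

// A def dominates the SUBREG_TO_REG if it comes earlier in the same block,
// or if its block dominates the SUBREG_TO_REG's block.
bool defDominates(const DomSets &dom, InstrPos def, InstrPos subregToReg) {
  if (def.block == subregToReg.block)
    return def.index < subregToReg.index;
  return dom[subregToReg.block][def.block];
}

// Keep only the defs of SrcReg that dominate the SUBREG_TO_REG.
std::vector<InstrPos> dominatingDefs(const DomSets &dom,
                                     const std::vector<InstrPos> &defs,
                                     InstrPos subregToReg) {
  std::vector<InstrPos> kept;
  for (const InstrPos &d : defs)
    if (defDominates(dom, d, subregToReg))
      kept.push_back(d);
  return kept;
}
```

For a diamond CFG (block 0 branching to blocks 1 and 2, which both fall through to block 3) with the SUBREG_TO_REG in block 3, a def in block 0 is kept. Defs in blocks 1 and 2 are dropped because neither side of the diamond dominates the join, and a def earlier in block 3 itself is kept.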

@llvmbot
Member

llvmbot commented Apr 4, 2025

@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

I previously had to revert #123632 due to failures on X86, and it took me a while to find the time to get back to this.

This PR tries to reland the original patch, with additional fixes. The PR is structured as follows:

  • The git reverted patch (with tests updated)
  • A fix to only add the implicit-def when tracking subreg-liveness of the destination register.
  • A fix to only add the implicit-def when the destination register is not dead.
  • Updated tests after latest rebase.
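The two gating fixes above, together with the partial-def check in `updateRegDefsUses`, can be summarised as a single predicate. This is a simplified sketch: `CoalescedDefInfo` and `needsImplicitSuperRegDef` are hypothetical, and the boolean fields stand in for `IsSubregToReg`, `FullDef`, `DeadDef` and `MRI->shouldTrackSubRegLiveness(DstReg)` from the diff below. (A reviewer questions the subreg-liveness restriction further down the thread.)

```cpp
// Simplified, hypothetical summary of what updateRegDefsUses knows about one
// coalesced instruction after SrcReg has been rewritten to DstReg.
struct CoalescedDefInfo {
  bool isSubregToReg;         // the copy being joined is a SUBREG_TO_REG
  bool fullDef;               // the rewritten operand defines all of DstReg
  bool deadDef;               // the rewritten sub-register def is dead
  bool tracksSubRegLiveness;  // subreg liveness is tracked for DstReg
};

// Mirrors the guard in the patch: add an implicit-def of the super-register
// only for a live, partial def produced by coalescing a SUBREG_TO_REG, and
// only when subreg liveness is tracked for the destination register.
bool needsImplicitSuperRegDef(const CoalescedDefInfo &info) {
  return info.isSubregToReg && !info.fullDef && !info.deadDef &&
         info.tracksSubRegLiveness;
}
```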

The PR depends on #131361, which was split off as a separate PR.


Patch is 141.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134408.diff

31 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+70-16)
  • (modified) llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll (+5-5)
  • (added) llvm/test/CodeGen/AArch64/reduced-coalescer-issue.ll (+51)
  • (added) llvm/test/CodeGen/AArch64/register-coalesce-implicit-def-subreg-to-reg.mir (+30)
  • (modified) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+55-3)
  • (modified) llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll (+3-4)
  • (modified) llvm/test/CodeGen/AMDGPU/fptosi.f16.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/fptoui.f16.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll (+11-12)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll (+11-12)
  • (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+6-10)
  • (modified) llvm/test/CodeGen/AMDGPU/select.f16.ll (+53-55)
  • (modified) llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll (+6-6)
  • (modified) llvm/test/CodeGen/PowerPC/aix-vec_insert_elt.ll (+4)
  • (modified) llvm/test/CodeGen/PowerPC/build-vector-tests.ll (+48)
  • (modified) llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll (+6)
  • (modified) llvm/test/CodeGen/PowerPC/combine-fneg.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/fp-strict-round.ll (+6)
  • (modified) llvm/test/CodeGen/PowerPC/frem.ll (+3)
  • (modified) llvm/test/CodeGen/PowerPC/handle-f16-storage-type.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/ldexp.ll (+2)
  • (modified) llvm/test/CodeGen/PowerPC/llvm.modf.ll (+1)
  • (modified) llvm/test/CodeGen/PowerPC/vec_insert_elt.ll (+4)
  • (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll (+176)
  • (added) llvm/test/CodeGen/X86/coalescer-breaks-subreg-to-reg-liveness.ll (+185)
  • (added) llvm/test/CodeGen/X86/coalescer-subreg-to-reg-implicit-def-regression.mir (+62)
  • (added) llvm/test/CodeGen/X86/coalescing-subreg-to-reg-requires-subrange-update.mir (+47)
  • (added) llvm/test/CodeGen/X86/pr76416.ll (+79)
  • (modified) llvm/test/CodeGen/X86/subreg-fail.mir (+2-2)
  • (added) llvm/test/CodeGen/X86/subreg-to-reg-coalescing.mir (+372)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index dbd354f2ca2c4..963f5620d8dba 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -306,7 +306,11 @@ class RegisterCoalescer : private LiveRangeEdit::Delegate {
   /// number if it is not zero. If DstReg is a physical register and the
   /// existing subregister number of the def / use being updated is not zero,
   /// make sure to set it to the correct physical subregister.
-  void updateRegDefsUses(Register SrcReg, Register DstReg, unsigned SubIdx);
+  ///
+  /// If \p IsSubregToReg, we are coalescing a DstReg = SUBREG_TO_REG
+  /// SrcReg. This introduces an implicit-def of DstReg on coalesced users.
+  void updateRegDefsUses(Register SrcReg, Register DstReg, unsigned SubIdx,
+                         bool IsSubregToReg);
 
   /// If the given machine operand reads only undefined lanes add an undef
   /// flag.
@@ -1444,6 +1448,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
 
   // CopyMI may have implicit operands, save them so that we can transfer them
   // over to the newly materialized instruction after CopyMI is removed.
+  LaneBitmask NewMIImplicitOpsMask;
   SmallVector<MachineOperand, 4> ImplicitOps;
   ImplicitOps.reserve(CopyMI->getNumOperands() -
                       CopyMI->getDesc().getNumOperands());
@@ -1458,6 +1463,9 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
               (MO.getSubReg() == 0 && MO.getReg() == DstOperand.getReg())) &&
              "unexpected implicit virtual register def");
       ImplicitOps.push_back(MO);
+      if (MO.isDef() && MO.getReg().isVirtual() &&
+          MRI->shouldTrackSubRegLiveness(DstReg))
+        NewMIImplicitOpsMask |= MRI->getMaxLaneMaskForVReg(MO.getReg());
     }
   }
 
@@ -1500,14 +1508,11 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
       } else {
         assert(MO.getReg() == NewMI.getOperand(0).getReg());
 
-        // We're only expecting another def of the main output, so the range
-        // should get updated with the regular output range.
-        //
-        // FIXME: The range updating below probably needs updating to look at
-        // the super register if subranges are tracked.
-        assert(!MRI->shouldTrackSubRegLiveness(DstReg) &&
-               "subrange update for implicit-def of super register may not be "
-               "properly handled");
+        // If lanemasks need to be tracked, compile the lanemask of the NewMI
+        // implicit def operands to avoid subranges for the super-regs from
+        // being removed by code later on in this function.
+        if (MRI->shouldTrackSubRegLiveness(MO.getReg()))
+          NewMIImplicitOpsMask |= MRI->getMaxLaneMaskForVReg(MO.getReg());
       }
     }
   }
@@ -1531,7 +1536,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
     MRI->setRegClass(DstReg, NewRC);
 
     // Update machine operands and add flags.
-    updateRegDefsUses(DstReg, DstReg, DstIdx);
+    updateRegDefsUses(DstReg, DstReg, DstIdx, false);
     NewMI.getOperand(0).setSubReg(NewIdx);
     // updateRegDefUses can add an "undef" flag to the definition, since
     // it will replace DstReg with DstReg.DstIdx. If NewIdx is 0, make
@@ -1607,7 +1612,8 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
           CurrIdx.getRegSlot(NewMI.getOperand(0).isEarlyClobber());
       VNInfo::Allocator &Alloc = LIS->getVNInfoAllocator();
       for (LiveInterval::SubRange &SR : DstInt.subranges()) {
-        if ((SR.LaneMask & DstMask).none()) {
+        if ((SR.LaneMask & DstMask).none() &&
+            (SR.LaneMask & NewMIImplicitOpsMask).none()) {
           LLVM_DEBUG(dbgs()
                      << "Removing undefined SubRange "
                      << PrintLaneMask(SR.LaneMask) << " : " << SR << "\n");
@@ -1872,7 +1878,7 @@ void RegisterCoalescer::addUndefFlag(const LiveInterval &Int, SlotIndex UseIdx,
 }
 
 void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
-                                          unsigned SubIdx) {
+                                          unsigned SubIdx, bool IsSubregToReg) {
   bool DstIsPhys = DstReg.isPhysical();
   LiveInterval *DstInt = DstIsPhys ? nullptr : &LIS->getInterval(DstReg);
 
@@ -1892,6 +1898,14 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
     }
   }
 
+  // If DstInt already has a subrange for the unused lanes, then we shouldn't
+  // create duplicate subranges when we update the interval for unused lanes.
+  LaneBitmask DefinedLanes;
+  if (DstInt && MRI->shouldTrackSubRegLiveness(DstReg)) {
+    for (LiveInterval::SubRange &SR : DstInt->subranges())
+      DefinedLanes |= SR.LaneMask;
+  }
+
   SmallPtrSet<MachineInstr *, 8> Visited;
   for (MachineRegisterInfo::reg_instr_iterator I = MRI->reg_instr_begin(SrcReg),
                                                E = MRI->reg_instr_end();
@@ -1915,6 +1929,9 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
     if (DstInt && !Reads && SubIdx && !UseMI->isDebugInstr())
       Reads = DstInt->liveAt(LIS->getInstructionIndex(*UseMI));
 
+    bool FullDef = true;
+    bool DeadDef = false;
+
     // Replace SrcReg with DstReg in all UseMI operands.
     for (unsigned Op : Ops) {
       MachineOperand &MO = UseMI->getOperand(Op);
@@ -1922,8 +1939,11 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
       // Adjust <undef> flags in case of sub-register joins. We don't want to
       // turn a full def into a read-modify-write sub-register def and vice
       // versa.
-      if (SubIdx && MO.isDef())
+      if (SubIdx && MO.isDef()) {
         MO.setIsUndef(!Reads);
+        FullDef = false;
+        DeadDef = MO.isDead();
+      }
 
       // A subreg use of a partially undef (super) register may be a complete
       // undef use now and then has to be marked that way.
@@ -1956,6 +1976,35 @@ void RegisterCoalescer::updateRegDefsUses(Register SrcReg, Register DstReg,
         MO.substVirtReg(DstReg, SubIdx, *TRI);
     }
 
+    if (IsSubregToReg && !FullDef && !DeadDef) {
+      // If the coalesced instruction doesn't fully define the register, we need
+      // to preserve the original super register liveness for SUBREG_TO_REG.
+      //
+      // We pretended SUBREG_TO_REG was a regular copy for coalescing purposes,
+      // but it introduces liveness for other subregisters. Downstream users may
+      // have been relying on those bits, so we need to ensure their liveness is
+      // captured with a def of other lanes.
+      //
+      // The implicit-def only needs adding if we track subregister liveness
+      // for this register, otherwise there is no point.
+
+      if (DstInt && MRI->shouldTrackSubRegLiveness(DstReg)) {
+        assert(DstInt->hasSubRanges() &&
+               "SUBREG_TO_REG should have resulted in subrange");
+        LaneBitmask DstMask = MRI->getMaxLaneMaskForVReg(DstInt->reg());
+        LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(SubIdx);
+        LaneBitmask UnusedLanes = DstMask & ~UsedLanes & ~DefinedLanes;
+        if ((UnusedLanes).any()) {
+          BumpPtrAllocator &Allocator = LIS->getVNInfoAllocator();
+          DstInt->createSubRangeFrom(Allocator, UnusedLanes, *DstInt);
+          DefinedLanes |= UnusedLanes;
+        }
+
+        MachineInstrBuilder MIB(*MF, UseMI);
+        MIB.addReg(DstReg, RegState::ImplicitDefine);
+      }
+    }
+
     LLVM_DEBUG({
       dbgs() << "\t\tupdated: ";
       if (!UseMI->isDebugInstr())
@@ -2157,6 +2206,8 @@ bool RegisterCoalescer::joinCopy(
     });
   }
 
+  const bool IsSubregToReg = CopyMI->isSubregToReg();
+
   ShrinkMask = LaneBitmask::getNone();
   ShrinkMainRange = false;
 
@@ -2226,9 +2277,12 @@ bool RegisterCoalescer::joinCopy(
 
   // Rewrite all SrcReg operands to DstReg.
   // Also update DstReg operands to include DstIdx if it is set.
-  if (CP.getDstIdx())
-    updateRegDefsUses(CP.getDstReg(), CP.getDstReg(), CP.getDstIdx());
-  updateRegDefsUses(CP.getSrcReg(), CP.getDstReg(), CP.getSrcIdx());
+  if (CP.getDstIdx()) {
+    assert(!IsSubregToReg && "can this happen?");
+    updateRegDefsUses(CP.getDstReg(), CP.getDstReg(), CP.getDstIdx(), false);
+  }
+  updateRegDefsUses(CP.getSrcReg(), CP.getDstReg(), CP.getSrcIdx(),
+                    IsSubregToReg);
 
   // Shrink subregister ranges if necessary.
   if (ShrinkMask.any()) {
diff --git a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
index 0f208f8ed9052..374def5d3cdb6 100644
--- a/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
+++ b/llvm/test/CodeGen/AArch64/implicit-def-subreg-to-reg-regression.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
-; RUN: llc -aarch64-min-jump-table-entries=4 -mtriple=arm64-apple-ios < %s | FileCheck %s
+; RUN: llc -aarch64-min-jump-table-entries=4 -mtriple=arm64-apple-ios -enable-subreg-liveness=false < %s | sed -e "/; kill: /d" | FileCheck %s
+; RUN: llc -aarch64-min-jump-table-entries=4 -mtriple=arm64-apple-ios -enable-subreg-liveness=true  < %s | FileCheck %s
 
 ; Check there's no assert in spilling from implicit-def operands on an
 ; IMPLICIT_DEF.
@@ -92,7 +93,6 @@ define void @widget(i32 %arg, i32 %arg1, ptr %arg2, ptr %arg3, ptr %arg4, i32 %a
 ; CHECK-NEXT:    ldr x8, [sp, #40] ; 8-byte Folded Reload
 ; CHECK-NEXT:    mov x0, xzr
 ; CHECK-NEXT:    mov x1, xzr
-; CHECK-NEXT:    ; kill: def $w8 killed $w8 killed $x8 def $x8
 ; CHECK-NEXT:    str x8, [sp]
 ; CHECK-NEXT:    bl _fprintf
 ; CHECK-NEXT:    brk #0x1
diff --git a/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll b/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
index 2a77d4dd33fe5..4206c0bc26991 100644
--- a/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
+++ b/llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_darwin.ll
@@ -27,11 +27,12 @@ define i32 @caller() nounwind ssp {
 ; CHECK-NEXT:    sub sp, sp, #208
 ; CHECK-NEXT:    mov w8, #10 ; =0xa
 ; CHECK-NEXT:    mov w9, #9 ; =0x9
-; CHECK-NEXT:    mov w10, #8 ; =0x8
+; CHECK-NEXT:    mov w0, #1 ; =0x1
 ; CHECK-NEXT:    stp x9, x8, [sp, #24]
-; CHECK-NEXT:    mov w8, #7 ; =0x7
+; CHECK-NEXT:    mov w8, #8 ; =0x8
 ; CHECK-NEXT:    mov w9, #6 ; =0x6
-; CHECK-NEXT:    mov w0, #1 ; =0x1
+; CHECK-NEXT:    str x8, [sp, #16]
+; CHECK-NEXT:    mov w8, #7 ; =0x7
 ; CHECK-NEXT:    mov w1, #2 ; =0x2
 ; CHECK-NEXT:    mov w2, #3 ; =0x3
 ; CHECK-NEXT:    mov w3, #4 ; =0x4
@@ -46,8 +47,7 @@ define i32 @caller() nounwind ssp {
 ; CHECK-NEXT:    stp x22, x21, [sp, #160] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp x20, x19, [sp, #176] ; 16-byte Folded Spill
 ; CHECK-NEXT:    stp x29, x30, [sp, #192] ; 16-byte Folded Spill
-; CHECK-NEXT:    stp x8, x10, [sp, #8]
-; CHECK-NEXT:    str x9, [sp]
+; CHECK-NEXT:    stp x9, x8, [sp]
 ; CHECK-NEXT:    bl _callee
 ; CHECK-NEXT:    ldp x29, x30, [sp, #192] ; 16-byte Folded Reload
 ; CHECK-NEXT:    ldp x20, x19, [sp, #176] ; 16-byte Folded Reload
diff --git a/llvm/test/CodeGen/AArch64/reduced-coalescer-issue.ll b/llvm/test/CodeGen/AArch64/reduced-coalescer-issue.ll
new file mode 100644
index 0000000000000..942b408b5f39c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/reduced-coalescer-issue.ll
@@ -0,0 +1,51 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -enable-subreg-liveness=false  < %s | FileCheck %s
+; RUN: llc -enable-subreg-liveness=true < %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "aarch64-unknown-linux-gnu"
+
+define void @_ZN4llvm5APInt6divideEPKmjS2_jPmS3_(i32 %lhsWords, i32 %rhsWords) {
+; CHECK-LABEL: _ZN4llvm5APInt6divideEPKmjS2_jPmS3_:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    lsl w9, w0, #1
+; CHECK-NEXT:    mov w10, #1 // =0x1
+; CHECK-NEXT:    mov w8, w0
+; CHECK-NEXT:    mov w0, #1 // =0x1
+; CHECK-NEXT:    sub w9, w9, w1, lsl #1
+; CHECK-NEXT:    bfi w0, w8, #1, #31
+; CHECK-NEXT:    lsr w9, w9, #1
+; CHECK-NEXT:    bfi w10, w9, #2, #30
+; CHECK-NEXT:    cmp w10, #0
+; CHECK-NEXT:    b.hs .LBB0_2
+; CHECK-NEXT:  // %bb.1: // %if.then15
+; CHECK-NEXT:    lsl x8, x0, #2
+; CHECK-NEXT:    ldr xzr, [x8]
+; CHECK-NEXT:    ret
+; CHECK-NEXT:  .LBB0_2:
+; CHECK-NEXT:    b _Znam
+  %mul = shl i32 %rhsWords, 1
+  %mul1 = shl i32 %lhsWords, 1
+  %sub = sub i32 %mul1, %mul
+  %add7 = or i32 %mul1, 1
+  %idxprom = zext i32 %add7 to i64
+  %mul3 = shl i32 %sub, 1
+  %add4 = or i32 %mul3, 1
+  %1 = icmp ult i32 %add4, 0
+  br i1 %1, label %if.then15, label %3
+
+common.ret:                                       ; preds = %3, %if.then15
+  ret void
+
+if.then15:                                        ; preds = %0
+  %idxprom12 = zext i32 %add7 to i64
+  %arrayidx13 = getelementptr [128 x i32], ptr null, i64 0, i64 %idxprom12
+  %2 = load volatile ptr, ptr %arrayidx13, align 8
+  br label %common.ret
+
+3:                                                ; preds = %0
+  %call = tail call ptr @_Znam(i64 %idxprom)
+  br label %common.ret
+}
+
+declare ptr @_Znam(i64)
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-implicit-def-subreg-to-reg.mir b/llvm/test/CodeGen/AArch64/register-coalesce-implicit-def-subreg-to-reg.mir
new file mode 100644
index 0000000000000..678d76527fa81
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-implicit-def-subreg-to-reg.mir
@@ -0,0 +1,30 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64 -start-before=register-coalescer -stop-after=virtregrewriter -enable-subreg-liveness=false -o - %s | FileCheck %s --check-prefix=SRLT
+# RUN: llc -mtriple=aarch64 -start-before=register-coalescer -stop-after=virtregrewriter -enable-subreg-liveness=true -o - %s | FileCheck %s --check-prefix=NOSRLT
+---
+name: test
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $x1
+    ; SRLT-LABEL: name: test
+    ; SRLT: liveins: $x1
+    ; SRLT-NEXT: {{  $}}
+    ; SRLT-NEXT: renamable $x0 = COPY $x1
+    ; SRLT-NEXT: renamable $w1 = ORRWrr $wzr, renamable $w0, implicit-def $x1
+    ; SRLT-NEXT: RET_ReallyLR implicit $x1, implicit $x0
+    ;
+    ; NOSRLT-LABEL: name: test
+    ; NOSRLT: liveins: $x1
+    ; NOSRLT-NEXT: {{  $}}
+    ; NOSRLT-NEXT: renamable $x0 = COPY $x1
+    ; NOSRLT-NEXT: renamable $w1 = ORRWrr $wzr, renamable $w0, implicit-def renamable $x1
+    ; NOSRLT-NEXT: RET_ReallyLR implicit $x1, implicit $x0
+    %190:gpr64 = COPY killed $x1
+    %191:gpr32 = COPY %190.sub_32:gpr64
+    %192:gpr32 = ORRWrr $wzr, killed %191:gpr32
+    %193:gpr64all = SUBREG_TO_REG 0, killed %192:gpr32, %subreg.sub_32
+    $x0 = COPY killed %190:gpr64
+    $x1 = COPY killed %193:gpr64all
+    RET_ReallyLR implicit $x1, implicit $x0
+...
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
index 08fc47d9480ce..abf739fb9095e 100644
--- a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -7,8 +7,8 @@
 # CHECK-DBG: ********** JOINING INTERVALS ***********
 # CHECK-DBG: ********** INTERVALS **********
 # CHECK-DBG: %0 [16r,32r:0) 0@16r  weight:0.000000e+00
-# CHECK-DBG: %3 [48r,112r:0) 0@48r  L0000000000000040 [48r,112r:0) 0@48r  weight:0.000000e+00
-# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  L0000000000000080 [112e,112d:0) 0@112e  L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  weight:0.000000e+00
+# CHECK-DBG: %3 [48r,112r:0) 0@48r  L0000000000000080 [48r,112r:0) 0@48r  L0000000000000040 [48r,112r:0) 0@48r  weight:0.000000e+00
+# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  L0000000000000080 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  weight:0.000000e+00
 # CHECK-DBG: %5 [32r,112r:1)[112r,112d:0) 0@112r 1@32r  weight:0.000000e+00
 ---
 name:            test
@@ -43,7 +43,7 @@ body:             |
 # CHECK-DBG: %1 [32r,48B:2)[48B,320r:0)[320r,368B:1) 0@48B-phi 1@320r 2@32r
 # CHECK-DBG-SAME: weight:0.000000e+00
 # CHECK-DBG: %3 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
-# CHECK-DBG-SAME: L0000000000000080 [288r,304B:0)[304B,320r:3) 0@288r 1@x 2@x 3@304B-phi
+# CHECK-DBG-SAME: L0000000000000080 [240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@x 3@304B-phi
 # CHECK-DBG-SAME: L0000000000000040 [80r,160B:2)[240r,272B:1)[288r,304B:0)[304B,320r:3) 0@288r 1@240r 2@80r 3@304B-phi
 # CHECK-DBG-SAME: weight:0.000000e+00
 ---
@@ -127,3 +127,55 @@ body:             |
     B %bb.1
 
 ...
+# Test that the interval `L0000000000000080 [112r,112d:1)` is not removed,
+# when removing undefined subranges.
+#
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: reproducer3
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: W0 [0B,32r:0)[320r,336r:1) 0@0B-phi 1@320r
+# CHECK-DBG: W1 [0B,16r:0) 0@0B-phi
+# CHECK-DBG: %0 [16r,64r:0) 0@16r  weight:0.000000e+00
+# CHECK-DBG: %1 [32r,128r:0) 0@32r  weight:0.000000e+00
+# CHECK-DBG: %2 [48r,64r:0) 0@48r  weight:0.000000e+00
+# CHECK-DBG: %3 [64r,80r:0) 0@64r  weight:0.000000e+00
+# CHECK-DBG: %4 [80r,176r:0) 0@80r  weight:0.000000e+00
+# CHECK-DBG: %7 [112r,128r:1)[128r,256r:0)[304B,320r:0) 0@128r 1@112r
+# CHECK-DBG-SAME: L0000000000000080 [112r,112d:1)[128r,256r:0)[304B,320r:0) 0@128r 1@112r
+# CHECK-DBG-SAME: L0000000000000040 [112r,128r:1)[128r,256r:0)[304B,320r:0) 0@128r 1@112r
+# CHECK-DBG-SAME: weight:0.000000e+00
+# CHECK-DBG: %8 [96r,176r:1)[176r,192r:0) 0@176r 1@96r  weight:0.000000e+00
+# CHECK-DBG: %9 [256r,272r:0) 0@256r  weight:0.000000e+00
+---
+name:            reproducer3
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $w0, $w1
+
+    %0:gpr32 = COPY killed $w1
+    %1:gpr32 = COPY killed $w0
+    %3:gpr32 = UBFMWri %1, 31, 30
+    %4:gpr32 = SUBWrs killed %3, killed %0, 1
+    %5:gpr32 = UBFMWri killed %4, 1, 31
+    %6:gpr32 = MOVi32imm 1
+    %7:gpr32 = COPY %6
+    %7:gpr32 = BFMWri %7, killed %1, 31, 30
+    %8:gpr64 = SUBREG_TO_REG 0, killed %7, %subreg.sub_32
+    %9:gpr32common = COPY killed %6
+    %9:gpr32common = BFMWri %9, killed %5, 30, 29
+    dead $wzr = SUBSWri killed %9, 0, 0, implicit-def $nzcv
+    Bcc 2, %bb.2, implicit killed $nzcv
+    B %bb.1
+
+  bb.1:
+    %10:gpr64common = UBFMXri killed %8, 62, 61
+    dead $xzr = LDRXui killed %10, 0
+    RET_ReallyLR
+
+  bb.2:
+    $x0 = COPY killed %8
+    RET_ReallyLR implicit killed $x0
+
+...
diff --git a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
index c739ba2183ef9..86ef27a1522f5 100644
--- a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
+++ b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
@@ -329,11 +329,10 @@ define <2 x half> @chain_hi_to_lo_global() {
 ; GFX11-TRUE16:       ; %bb.0: ; %bb
 ; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v0, 2
-; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v1, 0
+; GFX11-TRUE16-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, 0
+; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v3, 0
 ; GFX11-TRUE16-NEXT:    global_load_d16_b16 v0, v[0:1], off
-; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v1, 0
-; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v2, 0
-; GFX11-TRUE16-NEXT:    global_load_d16_hi_b16 v0, v[1:2], off
+; GFX11-TRUE16-NEXT:    global_load_d16_hi_b16 v0, v[2:3], off
 ; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-TRUE16-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/fptosi.f16.ll b/llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
index f84e14ea62273..d5f983c2f5648 100644
--- a/llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
@@ -328,13 +328,13 @@ define amdgpu_kernel void @fptosi_v2f16_to_v2i16(
 ; GFX11-TRUE16-NEXT:    buffer_load_b32 v0, off, s[8:11], 0
 ; GFX11-TRUE16-NEXT:    s_mov_b32 s5, s1
 ; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0)
-; GFX11-TRUE16-NEXT:    v_l...
[truncated]
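The lane-mask bookkeeping in the `updateRegDefsUses` hunk boils down to `UnusedLanes = DstMask & ~UsedLanes & ~DefinedLanes`, with `DefinedLanes` accumulating so that a subrange for the unused lanes is created at most once. The sketch below models `LaneBitmask` as a plain 64-bit integer, which is an assumption for illustration; `LaneMask`, `unusedLanes` and `subrangesCreated` are hypothetical names. For the example masks it assumes, as the AArch64 `CHECK-DBG` output above suggests, that `sub_32` corresponds to lane `0x40` and the remaining upper lane to `0x80`.

```cpp
#include <cstdint>

// Toy stand-in for llvm::LaneBitmask: one bit per register lane.
using LaneMask = std::uint64_t;

// Lanes of DstReg that the coalesced sub-register def does not write
// (usedLanes) and that no existing subrange covers yet (definedLanes).
LaneMask unusedLanes(LaneMask dstMask, LaneMask usedLanes,
                     LaneMask definedLanes) {
  return dstMask & ~usedLanes & ~definedLanes;
}

// Walks numPartialDefs coalesced users and counts how many new subranges
// would be created: because definedLanes accumulates the lanes that were
// added, the subrange for the unused lanes is created at most once.
int subrangesCreated(LaneMask dstMask, LaneMask usedLanes,
                     LaneMask definedLanes, int numPartialDefs) {
  int created = 0;
  for (int i = 0; i < numPartialDefs; ++i) {
    LaneMask unused = unusedLanes(dstMask, usedLanes, definedLanes);
    if (unused != 0) {
      ++created;
      definedLanes |= unused;
    }
  }
  return created;
}
```

With `DstMask = 0xC0`, `UsedLanes = 0x40` and only the `0x40` subrange present, the first partial def yields `UnusedLanes = 0x80`; every later user finds nothing left to add, which is what the `DefinedLanes` comment in the diff ("we shouldn't create duplicate subranges") is guarding.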

; GFX11-TRUE16-NEXT: v_mov_b32_e32 v1, 0
; GFX11-TRUE16-NEXT: v_mov_b32_e32 v2, 0
; GFX11-TRUE16-NEXT: global_load_d16_hi_b16 v0, v[1:2], off
; GFX11-TRUE16-NEXT: global_load_d16_hi_b16 v0, v[2:3], off
Collaborator Author

@broxigarchen I noticed these tests changed, but I couldn't really tell whether these changes are functionally equivalent.

Contributor

@broxigarchen broxigarchen Apr 7, 2025

Hi Sander, these changes seem good to me. Since I am not familiar with this pass, I will leave the approval to the other reviewers.

@sdesmalen-arm
Collaborator Author

@arsenm are you happy for me to reland this?

I've done better testing this time around; doing a two-stage build with sanitisers enabled and running LNT on both X86 and AArch64 platforms.

if (SubIdx && MO.isDef()) {
MO.setIsUndef(!Reads);
FullDef = false;
DeadDef = MO.isDead();
Contributor

I don't think dead flags are required to be accurate; it might be safer to check whether LiveIntervals thinks it's really dead.
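One way to read this suggestion is to derive deadness from the live range instead of from the operand flag. A live segment for a dead def ends at the same instruction's dead slot, printed as `d` (compare `L0000000000000080 [112r,112d:1)` in the AArch64 test above). The toy C++ model below checks exactly that. `Slot`, `SlotIdx`, `Segment` and `isDeadDefByLiveness` are hypothetical stand-ins for LLVM's `SlotIndex` and `LiveRange` classes, not their real interfaces; inside LLVM one would query the actual interval instead (for example through `LiveRange::Query()`, whose result has an `isDeadDef()` method).

```cpp
#include <vector>

// Toy model of slot indexes: each instruction has four slots, printed
// B (block), e (early-clobber), r (register) and d (dead), in that order.
enum class Slot { Block = 0, EarlyClobber = 1, Register = 2, Dead = 3 };

struct SlotIdx {
  unsigned instr;  // instruction number, e.g. 112 in "112r"
  Slot slot;
  unsigned key() const { return instr * 4 + static_cast<unsigned>(slot); }
};

// A live segment [start, end), as in "[112r,128r:1)".
struct Segment {
  SlotIdx start, end;
};

// Decide deadness from the live range rather than from a possibly stale dead
// flag: the def is dead if the segment it starts ends at the same
// instruction's dead slot, i.e. nothing reads the value afterwards.
bool isDeadDefByLiveness(const std::vector<Segment> &range, unsigned defInstr) {
  const SlotIdx def{defInstr, Slot::Register};
  for (const Segment &s : range)
    if (s.start.key() == def.key())
      return s.end.key() == SlotIdx{defInstr, Slot::Dead}.key();
  return false;  // toy choice: no segment starts here, so no def to judge
}
```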

Contributor

@arsenm arsenm left a comment

The "only if subreg liveness is enabled" point is invalid. Downstream users are still invalid; it just happens that having subregister liveness disabled hides the practical issues.

@sdesmalen-arm sdesmalen-arm force-pushed the users/sdesmalen-arm/srlt-commute-implicit-def branch from 808ebbd to 8fd87e2 July 9, 2025 16:17
@sdesmalen-arm sdesmalen-arm force-pushed the users/sdesmalen-arm/srlt-reland-pr-123632 branch from b9d9406 to 721aa45 July 9, 2025 16:20
@sdesmalen-arm
Collaborator Author

My apologies for taking a while to get back to this; I had been lacking focus time, and it also took me a while to get the changes right. This time I think it's in better shape, partly because I now have a better understanding of how things work, and partly because I've done more testing:
(a) I've built and run the LLVM test-suite with subreg-liveness-tracking enabled on AArch64.
(b) I've done stage-2 builds with sanitizers (address + memory) on both AArch64 and X86, with subreg liveness tracking enabled for AArch64.

@sdesmalen-arm
Collaborator Author

Gentle ping @arsenm and @qcolombet

I know that @arsenm is in favour of moving away from SUBREG_TO_REG entirely, but at the moment it is still used in many places by multiple targets and this PR fixes a genuine bug that is exposed with sub-reg liveness tracking.

@sdesmalen-arm sdesmalen-arm requested a review from arsenm July 30, 2025 12:29
@sdesmalen-arm
Collaborator Author

Apologies for the noise, I merely rebased the patch onto latest upstream/main, which required changing the base commit.

@sdesmalen-arm sdesmalen-arm merged commit bae8f13 into main Jul 30, 2025
9 checks passed
@sdesmalen-arm sdesmalen-arm deleted the users/sdesmalen-arm/srlt-reland-pr-123632 branch July 30, 2025 13:42
@nathanchance
Member

I am seeing an assertion failure when building several different distribution configurations of the Linux kernel for x86_64.

# bad: [4a509f853fa4821ecdb0f6bc3b90ddd48794cc8c] [libc++] Implement comparison operators for `tuple` added in C++23 (#148799)
# good: [277bcf7ffc79e7d8652dc2c89ce79535b405635a] [ELF][AsmPrinter] Emit trailing dot for constant pool section when it has a hotness prefix (#150859)
git bisect start '4a509f853fa4821ecdb0f6bc3b90ddd48794cc8c' '277bcf7ffc79e7d8652dc2c89ce79535b405635a'
# bad: [4ef92469ab341ac1bee39a9413ffaa845e307414] [libc++][hardening] Add a greppable prefix to assertion messages. (#150560)
git bisect bad 4ef92469ab341ac1bee39a9413ffaa845e307414
# bad: [75e5a705771315bc40d2188675d08cb2c71f6933] [MLIR] Migrate some conversion passes and dialects to LDBG() macro (NFC) (#151349)
git bisect bad 75e5a705771315bc40d2188675d08cb2c71f6933
# good: [20293ebd3159b3964c4466e6ee04d3e9b721eac0] [LLVM][CodeGen][SME] Only emit strided loads in streaming mode. (#150445)
git bisect good 20293ebd3159b3964c4466e6ee04d3e9b721eac0
# bad: [3d4f1fee48689465b5026f75414247307db7d34d] [mlir][spirv] Fix UpdateVCEPass to deduce the correct set of capabilities (#151108)
git bisect bad 3d4f1fee48689465b5026f75414247307db7d34d
# good: [1d6e68e63aa28783ad0de7d0b46238ce95849f2f] [analyzer] Conversion to CheckerFamily: NSOrCFErrorDerefChecker (#151171)
git bisect good 1d6e68e63aa28783ad0de7d0b46238ce95849f2f
# good: [635e6d76530328b8412fbf985708dad26e3f8ea5] [analyzer] Fix FP for cplusplus.placement new #149240 (#150161)
git bisect good 635e6d76530328b8412fbf985708dad26e3f8ea5
# good: [fcbbcffd2e6ea30097809ba0cd1e3b6003fa862f] [llvm-objcopy] [COFF] Ignore associative sections in executables (#151143)
git bisect good fcbbcffd2e6ea30097809ba0cd1e3b6003fa862f
# good: [17c1921b4a78b2ab3f455278c2a057d164863866] [mlir][spirv] Add support for structs decorations (#149793)
git bisect good 17c1921b4a78b2ab3f455278c2a057d164863866
# bad: [bae8f1336db6a7f3288a7dcf253f2d484743b257] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)
git bisect bad bae8f1336db6a7f3288a7dcf253f2d484743b257
# first bad commit: [bae8f1336db6a7f3288a7dcf253f2d484743b257] Reland "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG" (#134408)

cvise spits out:

int __outb_port, __snd_audiopci_probe___trans_tmp_11;
short __outl_port;
static int index;
char id[1];
struct ensoniq {
  int cssr;
} __snd_audiopci_probe_ensoniq;
int *__snd_audiopci_probe_card;
int snd_devm_card_new(int, char *, int **);
void snd_card_rw_proc_new(void(), void());
void snd_ensoniq_proc_read();
int __snd_audiopci_probe() {
  static int dev;
  int err = snd_devm_card_new(index, &id[dev], &__snd_audiopci_probe_card);
  struct ensoniq __trans_tmp_8 = __snd_audiopci_probe_ensoniq,
                 ensoniq = __trans_tmp_8;
  char value = 0;
  asm("out"
      " %"
      "0, %w1"
      :
      : "a"(value), "Nd"(__outb_port));
  {
    int value = ensoniq.cssr;
    asm("out"
        " %"
        "0, %w1"
        :
        : "a"(value), "Nd"(__outl_port));
  }
  snd_card_rw_proc_new(snd_ensoniq_proc_read, 0);
  __snd_audiopci_probe___trans_tmp_11 = err;
  return 0;
}
$ clang --target=x86_64-linux-gnu -O2 -c -o /dev/null ens1370.i
clang: llvm/lib/CodeGen/RegisterPressure.cpp:1171: void llvm::RegPressureTracker::getUpwardPressureDelta(const MachineInstr *, PressureDiff &, RegPressureDelta &, ArrayRef<PressureChange>, ArrayRef<unsigned int>) const: Assertion `(PDiffI->getUnitInc() >= 0) == (PNew >= POld) && "PSet overflow/underflow"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang --target=x86_64-linux-gnu -O2 -c -o /dev/null ens1370.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'ens1370.i'.
4.      Running pass 'Machine Instruction Scheduler' on function '@__snd_audiopci_probe'
 #0 0x0000559069ae68b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (clang-22+0x34ae8b8)
 #1 0x0000559069ae3ff5 llvm::sys::RunSignalHandlers() (clang-22+0x34abff5)
 #2 0x0000559069a65946 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f919583e540 (/usr/lib/libc.so.6+0x3e540)
 #4 0x00007f919589894c (/usr/lib/libc.so.6+0x9894c)
 #5 0x00007f919583e410 raise (/usr/lib/libc.so.6+0x3e410)
 #6 0x00007f919582557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00007f91958254e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3)
 #8 0x0000559069235abe llvm::RegPressureTracker::getUpwardPressureDelta(llvm::MachineInstr const*, llvm::PressureDiff&, llvm::RegPressureDelta&, llvm::ArrayRef<llvm::PressureChange>, llvm::ArrayRef<unsigned int>) const (clang-22+0x2bfdabe)
 #9 0x00005590690cd502 llvm::GenericScheduler::initCandidate(llvm::GenericSchedulerBase::SchedCandidate&, llvm::SUnit*, bool, llvm::RegPressureTracker const&, llvm::RegPressureTracker&) (clang-22+0x2a95502)
#10 0x00005590690cdb50 llvm::GenericScheduler::pickNodeFromQueue(llvm::SchedBoundary&, llvm::GenericSchedulerBase::CandPolicy const&, llvm::RegPressureTracker const&, llvm::GenericSchedulerBase::SchedCandidate&) (clang-22+0x2a95b50)
#11 0x00005590690cea83 llvm::GenericScheduler::pickNode(bool&) (clang-22+0x2a96a83)
#12 0x00005590690c4752 llvm::ScheduleDAGMILive::schedule() (clang-22+0x2a8c752)
#13 0x00005590690bc56a llvm::impl_detail::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) (clang-22+0x2a8456a)
#14 0x00005590690bbf4e llvm::impl_detail::MachineSchedulerImpl::run(llvm::MachineFunction&, llvm::TargetMachine const&, llvm::impl_detail::MachineSchedulerImpl::RequiredAnalyses const&) (clang-22+0x2a83f4e)
#15 0x00005590690d23e6 (anonymous namespace)::MachineSchedulerLegacy::runOnMachineFunction(llvm::MachineFunction&) MachineScheduler.cpp:0:0
#16 0x0000559069049623 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (clang-22+0x2a11623)
#17 0x00005590695e6b05 llvm::FPPassManager::runOnFunction(llvm::Function&) (clang-22+0x2faeb05)
#18 0x00005590695ee742 llvm::FPPassManager::runOnModule(llvm::Module&) (clang-22+0x2fb6742)
#19 0x00005590695e74c0 llvm::legacy::PassManagerImpl::run(llvm::Module&) (clang-22+0x2faf4c0)
#20 0x000055906a2268d3 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (clang-22+0x3bee8d3)
#21 0x000055906a23c655 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (clang-22+0x3c04655)
#22 0x000055906b8a87e9 clang::ParseAST(clang::Sema&, bool, bool) (clang-22+0x52707e9)
#23 0x000055906a7b36e6 clang::FrontendAction::Execute() (clang-22+0x417b6e6)
#24 0x000055906a7221ed clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (clang-22+0x40ea1ed)
#25 0x000055906a8854cc clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (clang-22+0x424d4cc)
#26 0x00005590688055f7 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (clang-22+0x21cd5f7)
#27 0x000055906880149f ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#28 0x000055906a587269 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#29 0x0000559069a6562e llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (clang-22+0x342d62e)
#30 0x000055906a586aa3 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (clang-22+0x3f4eaa3)
#31 0x000055906a547cfc clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (clang-22+0x3f0fcfc)
#32 0x000055906a547f17 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (clang-22+0x3f0ff17)
#33 0x000055906a564698 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (clang-22+0x3f2c698)
#34 0x0000559068800d43 clang_main(int, char**, llvm::ToolContext const&) (clang-22+0x21c8d43)
#35 0x0000559068811457 main (clang-22+0x21d9457)
#36 0x00007f9195827675 (/usr/lib/libc.so.6+0x27675)
#37 0x00007f9195827729 __libc_start_main (/usr/lib/libc.so.6+0x27729)
#38 0x00005590687fef25 _start (clang-22+0x21c6f25)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)

llvm-reduce spits out the following from the reproducer above:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @__snd_audiopci_probe() {
entry:
  %call = tail call i32 null(i32 0, ptr null, ptr null)
  %__trans_tmp_8.sroa.0.0.copyload = load i32, ptr null, align 4
  tail call void asm sideeffect "out $0, ${1:w}", "{ax},N{dx},~{dirflag},~{fpsr},~{flags}"(i8 0, i32 0)
  %0 = load i16, ptr null, align 2
  tail call void asm sideeffect "out $0, ${1:w}", "{ax},N{dx},~{dirflag},~{fpsr},~{flags}"(i32 %__trans_tmp_8.sroa.0.0.copyload, i16 %0)
  ret i32 0
}
$ llc -o /dev/null reduced.ll
llc: llvm/lib/CodeGen/RegisterPressure.cpp:1171: void llvm::RegPressureTracker::getUpwardPressureDelta(const MachineInstr *, PressureDiff &, RegPressureDelta &, ArrayRef<PressureChange>, ArrayRef<unsigned int>) const: Assertion `(PDiffI->getUnitInc() >= 0) == (PNew >= POld) && "PSet overflow/underflow"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -o /dev/null reduced.ll
1.      Running pass 'Function Pass Manager' on module 'reduced.ll'.
2.      Running pass 'Machine Instruction Scheduler' on function '@__snd_audiopci_probe'
 #0 0x000056083bb094c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (llc+0x32574c8)
 #1 0x000056083bb06b95 llvm::sys::RunSignalHandlers() (llc+0x3254b95)
 #2 0x000056083bb0a271 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f90c603e540 (/usr/lib/libc.so.6+0x3e540)
 #4 0x00007f90c609894c (/usr/lib/libc.so.6+0x9894c)
 #5 0x00007f90c603e410 raise (/usr/lib/libc.so.6+0x3e410)
 #6 0x00007f90c602557a abort (/usr/lib/libc.so.6+0x2557a)
 #7 0x00007f90c60254e3 __assert_perror_fail (/usr/lib/libc.so.6+0x254e3)
 #8 0x000056083ad6fade llvm::RegPressureTracker::getUpwardPressureDelta(llvm::MachineInstr const*, llvm::PressureDiff&, llvm::RegPressureDelta&, llvm::ArrayRef<llvm::PressureChange>, llvm::ArrayRef<unsigned int>) const (llc+0x24bdade)
 #9 0x000056083abc6452 llvm::GenericScheduler::initCandidate(llvm::GenericSchedulerBase::SchedCandidate&, llvm::SUnit*, bool, llvm::RegPressureTracker const&, llvm::RegPressureTracker&) (llc+0x2314452)
#10 0x000056083abc6aa0 llvm::GenericScheduler::pickNodeFromQueue(llvm::SchedBoundary&, llvm::GenericSchedulerBase::CandPolicy const&, llvm::RegPressureTracker const&, llvm::GenericSchedulerBase::SchedCandidate&) (llc+0x2314aa0)
#11 0x000056083abc79d3 llvm::GenericScheduler::pickNode(bool&) (llc+0x23159d3)
#12 0x000056083abbd3e2 llvm::ScheduleDAGMILive::schedule() (llc+0x230b3e2)
#13 0x000056083abb520a llvm::impl_detail::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) (llc+0x230320a)
#14 0x000056083abb4bee llvm::impl_detail::MachineSchedulerImpl::run(llvm::MachineFunction&, llvm::TargetMachine const&, llvm::impl_detail::MachineSchedulerImpl::RequiredAnalyses const&) (llc+0x2302bee)
#15 0x000056083abcb456 (anonymous namespace)::MachineSchedulerLegacy::runOnMachineFunction(llvm::MachineFunction&) MachineScheduler.cpp:0:0
#16 0x000056083aadd083 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (llc+0x222b083)
#17 0x000056083b0696a5 llvm::FPPassManager::runOnFunction(llvm::Function&) (llc+0x27b76a5)
#18 0x000056083b071482 llvm::FPPassManager::runOnModule(llvm::Module&) (llc+0x27bf482)
#19 0x000056083b06a0f0 llvm::legacy::PassManagerImpl::run(llvm::Module&) (llc+0x27b80f0)
#20 0x000056083a1756fe compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x000056083a172edd main (llc+0x18c0edd)
#22 0x00007f90c6027675 (/usr/lib/libc.so.6+0x27675)
#23 0x00007f90c6027729 __libc_start_main (/usr/lib/libc.so.6+0x27729)
#24 0x000056083a16ee65 _start (llc+0x18bce65)

@scamp-nvidia

Hi @sdesmalen-arm, I wonder whether your change is causing the issues I reported in #151768 and #151592, which affect two different ARM nodes (neoverse-v2 and neoverse-v1). Both show the RegisterCoalescer hitting assertions involving SUBREG_TO_REG.

sdesmalen-arm added a commit that referenced this pull request Aug 4, 2025
… when coalescing SUBREG_TO_REG" (#134408)"

This reverts commit bae8f13.

Some issues were found:
* #151768
* #151592
* #134408 (comment)
* #151888 (comment)

I'll revert this for the time being while I investigate.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 4, 2025
…er register when coalescing SUBREG_TO_REG" (#134408)"
@omjavaid
Contributor

omjavaid commented Aug 4, 2025

FYI: this broke several LLVM test-suite buildbots for AArch64 SVE, but the problem was masked because the relevant buildbots were already failing due to other breakage.

https://lab.llvm.org/buildbot/#/builders/4/
https://lab.llvm.org/buildbot/#/builders/17/
https://lab.llvm.org/buildbot/#/builders/41
https://lab.llvm.org/buildbot/#/builders/143

@gregbedwell
Collaborator

FYI, we spotted the following assert on x86_64 (which disappeared after ed5bd23). It is probably the same issue as reported above, but I'm including it here just in case, as the test case is quite different.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-unknown"

define { i32, i1 } @_Z1iv() {
entry:
  %call = tail call ptr null(ptr null)
  %0 = cmpxchg ptr null, i8 0, i8 1 monotonic monotonic, align 1
  %1 = cmpxchg ptr null, i32 0, i32 0 monotonic monotonic, align 4
  ret { i32, i1 } %1
}
greg@GREG-WIN11:~/git/llvm-project/build$ ./bin/llc ~/reduce/reduced.ll
llc: /home/greg/git/llvm-project/llvm/lib/CodeGen/RegisterPressure.cpp:1170: void llvm::RegPressureTracker::getUpwardPressureDelta(const llvm::MachineInstr*, llvm::PressureDiff&, llvm::RegPressureDelta&, llvm::ArrayRef<llvm::PressureChange>, llvm::ArrayRef<unsigned int>) const: Assertion `(PDiffI->getUnitInc() >= 0) == (PNew >= POld) && "PSet overflow/underflow"' failed.

MacDue added a commit to MacDue/llvm-project that referenced this pull request Sep 16, 2025
…remat

Currently, something like:

```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```

Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```

This marks the full $rax as used, not just $al.

With this change, this is rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```

This indicates that only $al is used. The issue is latent right now, but
is exposed when llvm#134408 is applied, because that patch causes
register pressure to be calculated incorrectly.

I think this change is in line with past fixes in this area, notably:
llvm@059cead
llvm@69cd121
MacDue added a commit that referenced this pull request Sep 26, 2025
…remat (#159110)

Currently, something like:

```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```

Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```

This marks the full $rax as used, not just $al.

With this change, this is rematerialized as:

```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```

This indicates that only $al is used.

Note: this issue is latent right now, but it is exposed when #134408 is
applied, because that patch causes register pressure to be calculated
incorrectly (unless this patch is applied too).

I think this change is in line with past fixes in this area, notably:

059cead

69cd121
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…remat (llvm#159110)
sdesmalen-arm added a commit that referenced this pull request Nov 17, 2025
…register when coalescing SUBREG_TO_REG" (#134408)""

This reverts commit ed5bd23.
sdesmalen-arm added a commit that referenced this pull request Nov 17, 2025
The register coalescer ran into some asserts:

* The newly added code tried to get the LiveInterval for a physical
  register (unguarded path)

* The assert 'assert(SubregToRegSrcInsts.empty() && "can this happen?");'
  could happen when using SUBREG_TO_REG to say that the top bits
  of the second register in a {128, 128} register tuple are zero, e.g.

  %8.qsub1:qq = MOVIv2d_ns 0
  %4:zpr = SUBREG_TO_REG 0, %8.qsub1:qq, %subreg.zsub
sdesmalen-arm added a commit that referenced this pull request Nov 24, 2025
…register when coalescing SUBREG_TO_REG" (#134408)""
sdesmalen-arm added a commit that referenced this pull request Nov 24, 2025
The register coalescer ran into some asserts:
sdesmalen-arm added a commit that referenced this pull request Nov 24, 2025
…alescing SUBREG_TO_REG"

A SUBREG_TO_REG instruction expresses that the top bits of the result
register are set to a certain value (e.g. 0).

The example below expresses that %1 will have its top 32 bits zeroed
and its lower 32 bits equal to the result of INSTR.
```
    %0:gpr32 = INSTR
    %1:gpr64 = SUBREG_TO_REG 0, %0, sub32
```
When the RegisterCoalescer tries to remove SUBREG_TO_REG instructions by
coalescing %0 into %1, it must preserve these semantics. Currently,
however, the RegisterCoalescer emits:
```
    %1.sub32:gpr64 = INSTR
```
which no longer expresses that the top 32 bits of the register are
defined (zeroed) by INSTR.

This can cause problems for passes such as Machine Copy Propagation,
which may conclude that it can remove a COPY-like instruction because
the MIR says only the bottom 32 bits are defined/used, even though
other uses of the register rely on the top 32 bits being zeroed by the
COPY-like instruction.

This PR changes the RegisterCoalescer to instead emit:
```
    undef %1.sub32:gpr64 = INSTR implicit-def %1
```
to express that the entire contents of %1:gpr64 are defined by the
instruction.

This tries to reland #134408, which had to be reverted due to a few
reported failures.
@gregbedwell
Collaborator

Hello,
I've just raised #169485, which bisects to the reland of this patch in bb78728.
Could you please take a look?

Thanks!

aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025
…alescing SUBREG_TO_REG"